Abstract

In previous work, I have developed an information theoretic 
complexity measure of networks. When applied to several 
real world food webs, there is a distinct difference in com- 
plexity between the real food web, and randomised control 
networks obtained by shuffling the network links. One hy- 
pothesis is that this complexity surplus represents informa- 
tion captured by the evolutionary process that generated the 
network. 

In this paper, I test this idea by applying the same complex- 
ity measure to several well-known artificial life models that 
exhibit ecological networks: Tierra, EcoLab and Webworld. 
Contrary to what was found in real networks, the foodwebs
generated by these artificial life models showed little
information difference between themselves and randomly
shuffled versions.



Introduction 



In Standish (2005), I developed a method for computing the
information complexity of a network. In Standish (2010a),
I refined and generalised the method to overcome a problem
with higher complexity values of empty and full networks
relative to partially filled networks of the same degree, and
to take account of link weights. Coupled with some
new algorithms for computing automorphism group size,
this network complexity measure is practical for networks
of several thousand nodes.

In Standish (2010a), I studied several published datasets
of natural networks, including a number of foodwebs available
from the Pajek website, and the neural network of C. elegans
(see Table 1). In most cases, these networks exhibited
significantly heightened complexity values compared with
those of control networks obtained by shuffling the links in a
random fashion. This leads to the hypothesis that evolutionary
processes tend to produce networks with a complexity
surplus ($\Delta$) compared with random assembly processes.

In this work, I apply the same methods to networks created
by artificial life evolutionary systems, in particular the
interaction network of Tierra (Ray, 1991) and the foodwebs
of EcoLab (Standish, 1994) and Webworld (Caldarelli et al.,
1998).



Complexity as Information

The notion of using information content as a complexity
measure is fairly simple. In most cases, there is an obvious
prefix-free representation language within which descriptions
of the objects of interest can be encoded. There
is also a classifier of descriptions that can determine if two
descriptions correspond to the same object. This classifier is
commonly called the observer, denoted $O(x)$.

To compute the complexity of some object $x$, count the
number of equivalent descriptions $\omega(\ell, x)$ of length $\ell$ that
map to the object $x$ under the agreed classifier. Then the
complexity of $x$ is given in the limit as $\ell \to \infty$:

$$C(x) = \lim_{\ell \to \infty} \left[ \ell \log_2 N - \log_2 \omega(\ell, x) \right] \tag{1}$$

where $N$ is the size of the alphabet used for the representation
language.

Because the representation language is prefix-free, every
description $y$ in that language has a unique prefix of length
$s(y)$. The classifier does not care what symbols appear after
this unique prefix. Hence $\omega(\ell, O(y)) \ge N^{\ell - s(y)}$. As $\ell$
increases, $\omega$ must increase at least as fast as $N^{\ell}$, and
do so monotonically. Therefore $C(O(y))$ decreases monotonically
with $\ell$, but is bounded below by 0. So equation (1)
converges.
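
As a worked special case (a sketch under an added assumption, not a result stated above), suppose an object $x$ has exactly $m$ distinct descriptions, all of the same length $s$. Each description can be padded with arbitrary symbols, so the count of length-$\ell$ descriptions is known in closed form once $\ell \ge s$, and the limit in equation (1), taken in the form $C(x) = \lim_{\ell\to\infty} [\ell \log_2 N - \log_2 \omega(\ell, x)]$, is reached immediately. The symbols $m$ and $s$ are introduced here purely for illustration.

```latex
\begin{align*}
\omega(\ell, x) &= m\, N^{\ell - s}, \qquad \ell \ge s, \\
C(x) &= \ell \log_2 N - \log_2\!\left( m\, N^{\ell - s} \right)
      = s \log_2 N - \log_2 m .
\end{align*}
```

For a binary alphabet ($N = 2$) with $s = 10$ and $m = 4$, this gives $C(x) = 10 - 2 = 8$ bits: the more equivalent descriptions an object has, the lower its complexity.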

To use this formalism with networks, we need to fix two 
things: how to decide when two networks are identical, and 
a prefix-free representation language, which will be used to 
count the representations of a given network. In this con- 
text, ignoring any link weights, two networks are considered 
identical if the nodes of one can be placed over the nodes 
of the second one, such that the links correspond exactly. 
They are topologically identical. We ignore any labels on 
the nodes or links. 

Network bitstring representation 

To represent the network as a bitstring, we need to store the
node count ($n$) and link count ($l$), as well as a representation
of the adjacency matrix. The initial part of the string has
$w = \lceil \log_2 n \rceil$ '1' bits, followed by a single '0' stop bit.
Following that are $w$ bits representing the value of $n$ in binary.
Knowing the value of $n$, the number of bits needed to represent
$l$ is $\lceil \log_2 L \rceil$, where $L = n(n-1)/2$, so $l$ is stored in
a field of that width.

Table 1: Complexity values of several freely available network datasets, as reported in Standish (2010a). For each network,
the number of nodes and links are given, along with the computed complexity $C$. In the fourth column, the original network is
shuffled 1000 times, and the logarithm of the complexity is averaged ($\langle \ln C_{ER} \rangle$). The fifth column gives the difference between
these two values, which represents the information content of the specific arrangement of links. The final column gives a
measure of the significance of this difference in terms of the number of standard deviations ("sigmas") of the distribution of
shuffled networks. In two examples, the distribution of shuffled networks had zero standard deviation, so $\infty$ appears in this
column.

For the final part of the string, the linkfield, we can represent
the adjacency matrix such that a '1' bit in position
$i(n-1)+j$ represents a link from node $i$ to node $j$ if $j < i$, or
from $i$ to $j+1$ if $j \ge i$, where nodes are numbered $0 \ldots n-1$,
$i < n$ and $j < n-1$. However, this representation is not
efficient: given $l$, there must be exactly $l$ '1' bits in the
linkfield, ie it is one of the permutations of $l$ '1' bits and $L - l$
'0' bits. We can enumerate the $\binom{L}{l}$ permutations, and
choose the rank of our linkfield in the enumeration as the
encoding of the linkfield. This is known as rank encoding
(Myrvold and Ruskey, 2001). One of the effects of choosing
this encoding is that both an empty and a full network have
just one possible linkfield, so will have a rank encoding of
0, representable in zero bits, as we already know whether a
network is empty or full from the values of $n$ and $l$. Hence, the
full and empty networks are the simplest networks for given
$n$ and $l$.
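
The layout above can be sketched in Python. This is a minimal illustration rather than the author's implementation: the function names are hypothetical, the enumeration order (lexicographic, with '0' sorting before '1') is an assumption, and the maximum link count `L` is passed in explicitly rather than derived from `n`.

```python
from math import comb


def ceil_log2(k):
    """Smallest integer b with 2**b >= k, for integer k >= 1 (exact, no floats)."""
    return (k - 1).bit_length()


def linkfield_rank(bits):
    """Lexicographic rank of a 0/1 sequence among all sequences of the same
    length containing the same number of 1s (assumed enumeration order)."""
    ones_left = sum(bits)
    length = len(bits)
    rank = 0
    for pos, bit in enumerate(bits):
        if ones_left == 0:
            break
        if bit == 1:
            # Every sequence with a 0 here, and all remaining 1s placed in the
            # later positions, precedes this one in lexicographic order.
            rank += comb(length - pos - 1, ones_left)
            ones_left -= 1
    return rank


def description_length(n, l, L):
    """Total bits: unary prefix + stop bit + n field + l field + rank field."""
    w = ceil_log2(n)                      # w '1' bits, one '0', then w bits for n
    l_width = ceil_log2(L)                # field holding the link count
    rank_width = ceil_log2(comb(L, l))    # field holding the linkfield rank
    return 2 * w + 1 + l_width + rank_width


print(linkfield_rank([0, 1, 0, 1]))       # rank among the 6 arrangements of two 1s in four bits
print(description_length(4, 0, 6))        # empty network: the rank field needs no bits
```

Because `comb(L, 0) == comb(L, L) == 1`, the rank field has zero width for empty and full networks, which is the property the text relies on.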

Weighted links 

Whilst the information contained in link weights might be
significant in some circumstances (for instance the weights
of a neural network can only be varied in a limited range
without changing the overall qualitative behaviour of the
network), of particular theoretical interest is to consider the
weights as continuous parameters connecting one network
structure with another. For instance, if a network $X$ has the
same network structure as $A$, with $b$ links of weight 1 forming
the network structure $B$ and the remaining $a - b$ links of weight
$w$, then we would like the network complexity of $X$ to vary
smoothly between that of $A$ and $B$ as $w$ varies from 1 to 0.
Görnerup and Crutchfield (2008) introduced a similar measure.

The most obvious way of defining this continuous complexity
measure is to start with normalised weights $\sum_i w_i = 1$.
Then arrange the links in weight order, and compute the
complexity of networks with just those links of weights less
than $w$. The final complexity value of a network $X = N \times L$,
where $N$ is the set of nodes, and $L$ the set of links with
associated weights $w_i$, $\forall i \in L$, is obtained by integrating:

$$C(X = N \times L) = \int_0^1 C\left(N \times \{ i \in L : w_i < w \}\right) dw \tag{2}$$

Obviously, since the integrand is a stepped function, this is
computed in practice by a sum of complexities of partial
networks.
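
The sum over partial networks can be sketched as follows. This is an illustrative sketch only: `weighted_complexity` and `complexity_of` are hypothetical names, the unweighted complexity is supplied by the caller, and the threshold $w$ is assumed to be integrated over $[0, 1]$.

```python
def weighted_complexity(weighted_links, complexity_of):
    """Evaluate the stepped integral as a finite sum.

    weighted_links: dict mapping a link (any hashable) to its normalised weight.
    complexity_of:  callable returning the complexity of the unweighted network
                    made of a given frozenset of links (supplied by the caller).
    """
    total = 0.0
    previous = 0.0
    included = set()   # links whose weight is strictly below the current threshold
    for weight in sorted(set(weighted_links.values())):
        # For thresholds in (previous, weight], the partial network is `included`.
        total += (weight - previous) * complexity_of(frozenset(included))
        included |= {k for k, v in weighted_links.items() if v == weight}
        previous = weight
    # For thresholds above the largest weight, every link is present.
    total += (1.0 - previous) * complexity_of(frozenset(included))
    return total


# Stand-in complexity (just the link count), used only to exercise the sum.
links = {("a", "b"): 0.25, ("b", "c"): 0.75}
print(weighted_complexity(links, len))
```

Any real use would replace the stand-in `len` with an actual unweighted network complexity calculation.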

Counting the representations 

In principle, one could compute the complexity of a network
by enumerating all bitstrings for a given $n$ and $l$, and
counting the number of bitstrings that represent the target
network. However, this algorithm is highly combinatoric,
and only really feasible for small networks. Fortunately, the
number of representations can also be computed by dividing
the total number of possible renumberings of the nodes ($n!$)
by the size of the automorphism group, for which several
practical algorithms exist (McKay, 1981; Standish, 2010b;
Darga et al., 2008). Although each of these algorithms has
exponential worst-case running time, in practice they tend to
perform quite well for networks of up to several thousand
nodes. Where one algorithm performs poorly, one of the
others tends to perform well, so a hybrid algorithm that runs
them all in parallel, returning the result of the first algorithm
to complete, performs extremely well.
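
For very small networks the counting argument can be checked by brute force. The sketch below is not one of the cited automorphism algorithms: it simply tries every renumbering, so it is only usable for a handful of nodes. The function names are hypothetical, and `complexity_bits` assumes a binary alphabet and that every representation of the network has the same length `description_bits`.

```python
from itertools import permutations
from math import factorial, log2


def automorphism_count(n, links):
    """Number of node renumberings that map the directed link set onto itself."""
    links = set(links)
    count = 0
    for perm in permutations(range(n)):
        if {(perm[i], perm[j]) for i, j in links} == links:
            count += 1
    return count


def representation_count(n, links):
    """Distinct labelled networks equivalent to this one: n! / |Aut|."""
    return factorial(n) // automorphism_count(n, links)


def complexity_bits(description_bits, n, links):
    """Description length minus log2 of the number of equivalent descriptions
    (assumes all descriptions share the same length, as stated above)."""
    return description_bits - log2(representation_count(n, links))


cycle = [(0, 1), (1, 2), (2, 0)]     # directed 3-cycle: only rotations preserve it
chain = [(0, 1), (1, 2)]             # directed chain: only the identity preserves it
print(automorphism_count(3, cycle), representation_count(3, cycle))
print(automorphism_count(3, chain), representation_count(3, chain))
```

The more symmetric cycle has fewer distinct representations than the chain, so for the same description length it receives the higher complexity value.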



ALife models 



Tierra 



Tierra (Ray, 1991) is a well known artificial life system in
which self-reproducing computer programs written in an
assembly-like language are allowed to evolve. The programs,
or digital organisms, can interact with each other via
template matching operations, modelled loosely on the way
proteins interact in real biological systems. A number of
distinct strategies evolve, including parasitism, where
organisms make use of another organism's code, and
hyper-parasitism, where an organism sets traps for parasites in
order to steal their CPU resources. At any point in time in
a Tierra run, there is an interaction network between the
species present, which is the closest thing in the Tierra world
to a foodweb.

Tierra is an aging platform, with the last release (v6.02)
having been released more than six years ago. For this work,
I used an even older release (5.0), with which I have some
experience. Tierra was originally written in
C for an environment where ints were 16 bits and long ints 
32 bits. This posed a problem for using it on the current gen- 
eration of 64 bit computers, where the word sizes are dou- 
bled. Some effort was needed to get the code 64 bit clean. 
Secondly a means of extracting the interaction network was 
needed. Whilst Tierra provided the concept of "watch bits", 
which recorded whether a digital organism had accessed an- 
other's genome or vice versa, it did not record which other 
genome was accessed. So I modified the template match- 
ing code to log the pair of genome labels that performed the 
template match to a file. 

Having a record of interactions by genotype label, it is
necessary to map the genotype to phenotype. In Tierra, the
phenotype is the behaviour of the digital organism, and can
be judged by running the organisms pairwise in a tournament,
to see what effect each has on the other. The precise
details of how this can be done are described in Standish
(2003).

Having a record of interactions between phenotypes, and
discarding self-self interactions, there are a number of ways
of turning that record into a foodweb. The simplest way,
which I adopted, was to sum the interactions between each pair
of phenotypes over a sliding window of 100 million executed
instructions, and to do this every 20 million executed
instructions. This led to a time series of around 2000
foodwebs for each Tierra run.
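
The windowing step might look like the following Python sketch. It is not the Eco-tierra code: the event format `(instruction_count, phenotype_a, phenotype_b)`, the function name and the alignment of the first window at instruction 0 are all assumptions made for illustration.

```python
from collections import Counter


def sliding_window_foodwebs(events, window=100_000_000, step=20_000_000):
    """Sum interactions per ordered phenotype pair over overlapping windows.

    events: iterable of (instruction_count, phenotype_a, phenotype_b) tuples
            (a hypothetical log format).
    Returns a list of (window_start, {(a, b): interaction_count}) entries.
    """
    # Discard self-self interactions, as in the text.
    events = [e for e in events if e[1] != e[2]]
    if not events:
        return []
    end_time = max(t for t, _, _ in events)
    webs = []
    start = 0
    while start <= end_time:
        web = Counter()
        for t, a, b in events:
            if start <= t < start + window:
                web[(a, b)] += 1
        webs.append((start, dict(web)))
        start += step
    return webs


# Tiny example with a window of 20 and a step of 10 "instructions".
log = [(5, "A", "B"), (15, "A", "B"), (25, "B", "A"), (25, "A", "A")]
for start, web in sliding_window_foodwebs(log, window=20, step=10):
    print(start, web)
```

This rescans the whole log for every window, which is fine for a sketch; a production version would advance two pointers over a sorted log instead.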

In Tierra, parsimony pressure is controlled by the parameter
SlicePow. CPU time is allocated proportional to genome
size raised to the power SlicePow. If SlicePow is close to 0,
then there is great evolutionary pressure for the organisms to
get as small as possible to increase their replication rate.
When it is one, this pressure is eliminated. In Standish (2004b),
I found that a SlicePow of around 0.95 was optimal. If it were
much higher, the organisms grow so large and so rapidly that
they eventually occupy more than 50% of the soup, at which
point they kill the soup at their next Mal (memory allocation)
operation. In this work, I altered the implementation
of Mal to fail if the request was more than the soup
size divided by the minimum population save threshold (usually
around 10). Organisms any larger than this will never appear
in the Genebanker (Tierra's equivalent of the fossil record),
as their population can never exceed the save threshold. This
modification allows SlicePow = 1 runs to run for an extensive
period of time without the soup dying.
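
The effect of SlicePow on parsimony pressure can be illustrated with a toy calculation. This is not Tierra code: the function name is hypothetical, the constant of proportionality is taken as 1, and it is assumed that the work needed to replicate is proportional to genome size.

```python
def cpu_time_per_genome_instruction(genome_size, slice_pow):
    """CPU time received per instruction of genome, assuming the time slice
    is genome_size ** slice_pow and replication cost scales with genome size."""
    time_slice = genome_size ** slice_pow
    return time_slice / genome_size


for slice_pow in (0.0, 0.95, 1.0):
    small = cpu_time_per_genome_instruction(40, slice_pow)
    large = cpu_time_per_genome_instruction(80, slice_pow)
    # Ratio > 1 means the smaller organism replicates faster.
    print(f"SlicePow={slice_pow}: advantage of size 40 over size 80 = {small / large:.3f}")
```

Under these assumptions an organism half the size gains a factor of two when SlicePow is 0 and no advantage at all when SlicePow is 1, matching the qualitative description above.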

EcoLab 

EcoLab was introduced by the author as a simple model of
an evolving ecosystem (Standish, 1994). The ecological
dynamics is described by an $n$-dimensional generalised
Lotka-Volterra equation:

$$\dot{n}_i = r_i n_i + \sum_j n_i \beta_{ij} n_j \tag{3}$$

where $n_i$ is the population density of species $i$, $r_i$ its growth
rate and $\beta_{ij}$ the interaction matrix. Extinction is handled via
a novel stochastic truncation algorithm, rather than the more
usual threshold method. Speciation occurs by randomly
mutating the ecological parameters ($r_i$ and $\beta_{ij}$) of the parents,
subject to the constraint that the system remain bounded
(Standish, 2000).

The interaction matrix is a candidate foodweb, but has too
much information. Its off-diagonal terms may be negative as
well as positive, whereas for the complexity definition (2),
we need the link weights to be positive. There are a number
of ways of resolving this issue, such as ignoring the sign of
the off-diagonal term (ie taking its absolute value), or
antisymmetrising the matrix by subtracting its transpose, then
using the sign of the off-diagonal term to determine the link
direction.

For the purposes of this study, I chose to subtract just the
negative $\beta_{ij}$ terms from themselves and from their transpose
terms $\beta_{ji}$. This effects a maximal encoding of the interaction
matrix information in the network structure, with link direction
and weight encoding the direction and size of resource flow. The
effect is as follows:



• Both $\beta_{ij}$ and $\beta_{ji}$ are positive (the mutualist case). Neither
off-diagonal term changes, and the two nodes have links
pointing in both directions, with weights given by the two
off-diagonal terms.

• Both $\beta_{ij}$ and $\beta_{ji}$ are negative (the competitive case). The
terms are swapped, and the signs changed to be positive.
Again the two nodes have links pointing in both directions,
but the link direction reflects the direction of resource flow.

• $\beta_{ij}$ and $\beta_{ji}$ are of opposite sign (the predator-prey or
parasitic case). Only a single link exists between species
$i$ and $j$, whose weight is the summed absolute values of
the off-diagonal terms, and whose link direction reflects
the direction of resource flow.
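
The three cases follow from a single rule: each negative off-diagonal term is removed from its own position and its magnitude is added to the transposed position. A minimal Python sketch of that rule is given below. It is not the EcoLab implementation; the function name is hypothetical, and entry `web[i][j]` is simply taken to be the weight of the link stored at position $(i, j)$.

```python
def foodweb_from_interaction(beta):
    """Turn an interaction matrix with signed off-diagonal terms into a
    non-negative link-weight matrix, by moving the magnitude of each negative
    term to its transpose position. Diagonal terms are ignored."""
    n = len(beta)
    web = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if beta[i][j] > 0:
                web[i][j] += beta[i][j]       # positive terms stay in place
            elif beta[i][j] < 0:
                web[j][i] += -beta[i][j]      # negative terms move to the transpose
    return web


beta = [
    [-1.0,  2.0, -3.0],   # species 0 and 1: opposite sign; species 0 and 2: opposite sign
    [-0.5, -1.0, -4.0],   # species 1 and 2: both negative (competitive case)
    [ 1.0, -2.0, -1.0],
]
for row in foodweb_from_interaction(beta):
    print(row)
```

In the example, the opposite-sign pairs collapse to a single link whose weight is the sum of the two magnitudes, and the competitive pair keeps two links with the magnitudes swapped.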

Webworld 

Webworld is another evolving ecology model, similar in
some respects to EcoLab, introduced by Caldarelli et al.
(1998), with some modifications described in Drossel et al.
(2001). It features more realistic ecological interactions than
does EcoLab, in that it tracks biomass resources. It too has
an interaction matrix, called a functional response in that
model, that could serve as a foodweb, which is converted
to a directed weighted graph in the same way as the EcoLab
interaction matrix. I used the Webworld implementation
distributed with the EcoLab simulation platform (Standish,
2004a).

Results 
Methods and materials 

Tierra was run on a 512KB soup, with SlicePow set to 1, until
the soup died, typically after some $5 \times 10^{10}$ instructions
had executed. Some variant runs were performed with
SlicePow = 0.95, and with different random number generators,
but no difference in the outcome was observed.

The source code of Tierra 5.0 was modified in
a few places, as described in the Tierra section of
this paper. The final source code is available as
tierra.5.0.D7.tar.gz from the EcoLab website hosted on
SourceForge (http://ecolab.sf.net).

The genebanker output was processed by the eco-tierra.3.D13
code, also available from the EcoLab website,
to produce a list of phenotype equivalents for each genotype.
A function for processing the interaction log file generated
by Tierra and producing a time series of foodweb graphs was
added to Eco-tierra. The script for running this postprocessing
step is process_ecollog.tcl.

The EcoLab model was adapted to convert the interaction
matrix into a foodweb and log the foodweb to disk every
1000 time steps for later processing. The Webworld model
was adapted similarly. The model parameters were as
documented in the included ecolab.tcl and webworld.tcl
experiment files of the ecolab.4.D37 distribution, which is also
available from the EcoLab website.

Figure 1: Complexity of the Tierran interaction network for SlicePow=0.95, and $\Delta$, exaggerated by a factor of 100. Two
different random number generators were used, Havege and the normal linear congruential generator supplied with Tierra.

Figure 2: Complexity of the Tierran interaction network for SlicePow=1, and $\Delta$, exaggerated by a factor of 100. Two different
random number generators were used, Havege and the normal linear congruential generator supplied with Tierra.

Figure 3: Complexity of EcoLab's foodweb, and $\Delta$, exaggerated by a factor of 100, as described in the text.

Finally, each foodweb, and 100 link-shuffled control versions
of it, were run through the network complexity algorithm
(2). This is documented in the cmpERmodel.tcl script of
ecolab.4.D37. The average and standard deviation of $\ln C$
were calculated, rather than of $C$ directly, as the shuffled
complexity values fitted a log-normal distribution better than a
standard normal distribution. The difference between the
measured complexity and $\exp\langle \ln C \rangle$ (ie the geometric mean
of the control network complexities) is what is reported as
$\Delta$ in the figures.
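
The statistics described above can be sketched in Python as follows. This is not the cmpERmodel.tcl script: the function name is hypothetical, the complexity values are assumed to have been computed already, the use of the sample standard deviation is an assumption, and so is the choice to express significance in log space.

```python
from math import exp, log
from statistics import mean, stdev


def complexity_surplus(c_measured, c_shuffled):
    """Return (delta, n_sigma) for a measured complexity against the
    complexities of its link-shuffled controls.

    delta:   measured complexity minus the geometric mean of the controls.
    n_sigma: distance of ln(c_measured) from the mean of ln(control), in units
             of the standard deviation of ln(control); infinite if the
             controls do not vary at all.
    """
    logs = [log(c) for c in c_shuffled]
    mu = mean(logs)
    sigma = stdev(logs)              # sample standard deviation (an assumption)
    delta = c_measured - exp(mu)
    n_sigma = (log(c_measured) - mu) / sigma if sigma > 0 else float("inf")
    return delta, n_sigma


# Made-up numbers purely to exercise the function.
delta, n_sigma = complexity_surplus(5.0, [2.0, 8.0])
print(delta, n_sigma)
```

Working with the mean of the logarithms makes the reference value the geometric rather than the arithmetic mean of the control complexities, which is the choice motivated in the text by the log-normal fit.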

Discussion 

It can be seen from Figures 1-4 that none of the artificial life
models studied generate substantially greater network
complexities than do the control networks. By "substantially", I
mean more than 10% of the total network complexity. The
complexity difference that exists is nevertheless often
statistically significant, albeit small (of the order of a few bits).
By contrast, most of the 26 practical networks studied in
Standish (2010a) exhibited substantially greater complexities
than their controls, the exceptions being the David Copperfield
adjective-noun adjacency dataset (0.98 bits), and the
C. elegans metabolic network (which at 34.6 bits is about
0.1% of the total complexity).

The complete failure of several independent artificial
evolutionary systems to generate this complexity
surplus weakens the case for the surplus being due
to the operation of an evolutionary process. It is possible that
this is another illustration of the difference between artificial
evolutionary systems and natural evolutionary systems
observed with Bedau-Packard statistics (Bedau et al.,
1998). There is also the possibility that some systematic
artifact skews the observational data towards more symmetric
networks (which increases complexity values); however,
it seems implausible that networks collected by many
different observers in many different fields should exhibit the
same systematic error. More work needs to be done applying
this complexity metric to both artificially evolved networks
and observational data of naturally evolved networks to
elucidate whether this is an artifact or a real phenomenon.

Conclusion 

In this work, I measured the network complexity of several 
artificially evolved foodwebs to see if I could reproduce the 
complexity surplus seen in empirical network data. In none 
of the artificial systems I studied was the complexity surplus 
substantial enough to be considered a real effect. 



References 

Bedau, M. A., Snyder, E., and Packard, N. H. (1998). A
classification of long-term evolutionary dynamics. In Adami, C.,
Belew, R., Kitano, H., and Taylor, C., editors, Artificial Life
VI, pages 228-237, Cambridge, Mass. MIT Press.

Caldarelli, G., Higgs, P. G., and McKane, A. J. (1998). Modelling
coevolution in multispecies communities. J. Theor. Biol.,
193:345-358.

Darga, P. T., Sakallah, K. A., and Markov, I. L. (2008). Faster
symmetry discovery using sparsity of symmetries. In
Proceedings of the 45th Design Automation Conference, Anaheim,
California.

Drossel, B., Higgs, P. G., and McKane, A. J. (2001). The
influence of predator-prey population dynamics on the long-term
evolution of food web structure. J. Theor. Biol., 208:91-107.

Görnerup, O. and Crutchfield, J. P. (2008). Hierarchical
self-organization in the unitary process soup. Artificial Life,
14:245-254.

McKay, B. D. (1981). Practical graph isomorphism. Congressus
Numerantium, 30:45-87.

Myrvold, W. and Ruskey, F. (2001). Ranking and unranking
permutations in linear time. Information Processing Letters,
79:281-284.

Ray, T. (1991). An approach to the synthesis of life. In Langton,
C. G., Taylor, C., Farmer, J. D., and Rasmussen, S., editors,
Artificial Life II, page 371. Addison-Wesley, Reading, Mass.

Standish, R. K. (1994). Population models with random
embryologies as a paradigm for evolution. Complexity International,
2.

Standish, R. K. (2000). The role of innovation within economics.
In Barnett, W., Chiarella, C., Keen, S., Marks, R., and
Schnabl, H., editors, Commerce, Complexity and Evolution,
volume 11 of International Symposia in Economic Theory and
Econometrics, pages 61-79. Cambridge UP.

Standish, R. K. (2003). Open-ended artificial evolution.
International Journal of Computational Intelligence and
Applications, 3:167. arXiv:nlin.AO/0210027.

Standish, R. K. (2004a). Ecolab, Webworld and self-organisation.
In Pollack et al., editors, Artificial Life IX, page 358,
Cambridge, MA. MIT Press.

Standish, R. K. (2004b). The influence of parsimony and
randomness on complexity growth in Tierra. In Bedau et al., editors,
ALife IX Workshop and Tutorial Proceedings, pages 51-55.
arXiv:nlin.AO/0604026.

Standish, R. K. (2005). Complexity of networks. In Abbass et al.,
editors, Recent Advances in Artificial Life, volume 3 of
Advances in Natural Computation, pages 253-263, Singapore.
World Scientific. arXiv:cs.IT/0508075.

Standish, R. K. (2010a). Complexity of networks (reprise).
Artificial Life, submitted. arXiv: 0911.348.

Standish, R. K. (2010b). SuperNOVA: a novel algorithm for graph
automorphism calculations. Journal of Algorithms - Algorithms
in Cognition, Informatics and Logic, submitted. arXiv:0905.3927.




